Maintenance of a persistent master identifier for clusters of user identifiers across a plurality of devices

ABSTRACT

A method for maintaining a persistent master identifier includes receiving a first plurality of clusters including a first plurality of user device identifiers and at least one first cluster attribute associated with the first plurality of identifiers. The first plurality of user device identifiers is stored as separate entries in a master table. Each entry includes at least a persistent master identifier, a user device identifier contained within the identified cluster, and the at least one attribute associated with the identifier. A second plurality of clusters including a second plurality of user device identifiers and at least one attribute associated with the second plurality of identifiers is received. Persistence of each entry in the master table is determined based on a comparison between the second plurality of clusters and the master table. The second plurality of clusters is selectively associated with a persistent master identifier based on the comparison.

FIELD OF THE INVENTION

This specification is directed, in general, to an information processing system, and, more particularly, to maintenance of a persistent master identifier for clusters of user identifiers across a plurality of electronic devices.

BACKGROUND OF THE INVENTION

Consumer electronic devices, such as desktop and laptop computers, mobile phones, notebooks, tablets, MP3 players, connected TVs, etc., are ubiquitous. Part of the reason for the rapid growth in the number of mobile phones and other electronic devices is the rapid pace at which these devices evolve. More and more people are using multiple devices to access the internet. Through these devices they use browsers, apps or other methods to access content and interactive services and to communicate. Companies providing content can identify and track several user data points, such as the actual IP address, headers for webpage requests and responses, the user's browsing history and various user device identifiers.

Typically, these user device identifiers are different across the various environments. In other words, one user may have many different user device identifiers, also referred to herein simply as ‘UIDs’. These UIDs are not constructed to remain constant, and they have a certain ‘lifespan’ ranging from less than a second to weeks or months. There is a need to create applications capable of recognizing the user as one individual person across devices, websites and applications. This problem is most pressing in the online advertising industry, where various applications focused on providing information related to the reach and frequency of a digital campaign are unable to provide accurate advertising metrics, optimizations and measurements without a cross-device view of the user.

SUMMARY OF THE INVENTION

The purpose and advantages of the below described illustrated embodiments will be set forth in and apparent from the description that follows. Additional advantages of the illustrated embodiments will be realized and attained by the devices, systems and methods particularly pointed out in the written description and claims hereof as well as from the appended drawings.

To achieve these and other advantages and in accordance with the purpose of the illustrated embodiments, in one aspect, a method and system for maintaining a persistent master identifier across a plurality of devices is provided. The method includes receiving a first plurality of clusters including a first plurality of user device identifiers that identifies at least one user device, and at least one first cluster attribute associated with each of the first plurality of user device identifiers. The first plurality of user device identifiers is stored as separate entries in a master table. Each entry in the master table includes at least a persistent master identifier uniquely identifying one of the plurality of clusters, a user device identifier contained within the identified cluster, and the at least one attribute associated with the user device identifier. A second plurality of clusters including a second plurality of user device identifiers that identifies at least one user device and at least one second cluster attribute associated with each of the second plurality of user device identifiers is received. Persistence of each entry in the master table is determined based on at least one comparison between the second plurality of clusters and the entries in the master table. The second plurality of clusters is selectively associated with a persistent master identifier based on the comparison.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying appendices and/or drawings illustrate various non-limiting, example, inventive aspects in accordance with the present disclosure:

FIG. 1 is a block diagram illustrating an environment in which embodiments of the present invention may be practiced;

FIG. 2 is a block diagram illustrating the MatchID analysis system of FIG. 1, according to one embodiment of the present invention;

FIG. 3 is a flowchart illustrating an example of a flow of processing performed by the MatchID analysis system of FIG. 1, in accordance with illustrative embodiments of the present invention;

FIG. 4 is an example of two different clusters of user identifiers associated with two different instances in time, in accordance with illustrative embodiments of the present invention;

FIG. 5 is an example illustrating association of a persistent master identifier with a candidate cluster in a case where there is only one candidate cluster with overlapping UIDs, in accordance with illustrative embodiments of the present invention;

FIG. 6 is an example illustrating association of a persistent master identifier with a candidate cluster in a case where there is more than one candidate cluster with overlapping UIDs and only one candidate cluster having the highest number of overlapping UIDs, in accordance with illustrative embodiments of the present invention;

FIG. 7 is an example illustrating association of a persistent master identifier with a candidate cluster in a case where there is more than one candidate cluster with overlapping UIDs and more than one candidate cluster having the highest number of overlapping UIDs, in accordance with illustrative embodiments of the present invention; and

FIG. 8 is an example illustrating association of a persistent master identifier with a candidate cluster based on matching attributes associated with UIDs in a case where there are no overlapping UIDs, in accordance with illustrative embodiments of the present invention.

In the drawings, like characters of reference indicate corresponding parts in the different figures. The drawing figures, elements and other depictions should be understood as being interchangeable and may be combined in any like manner in accordance with the disclosures and objectives recited herein.

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

The illustrated embodiments described herein are merely exemplary of the invention, which can be embodied in various forms, as appreciated by one skilled in the art. Therefore, it is to be understood that any structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative for teaching one skilled in the art. Furthermore, the terms and phrases used herein are not intended to be limiting but rather to provide an understandable description of the invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Any methods and materials similar or equivalent to those described herein may also be used to practice or test the instant disclosures and those inherent to the same.

It must be noted that as used herein and in the appended claims, the singular forms “a”, “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a stimulus” includes a plurality of such stimuli and reference to “the signal” includes reference to one or more signals and equivalents thereof known to those skilled in the art, and so forth.

It is to be appreciated that the embodiments of this invention as discussed below may preferably be implemented as a software algorithm, program or code residing on a computer useable medium having control logic for enabling execution on a machine having a computer processor. Such a machine typically includes memory storage configured to provide output from execution of the computer algorithm or program.

As used herein, the term “software” is meant to be synonymous with any code or program that can be in a processor of a host computer, regardless of whether the implementation is in hardware, firmware or as a software computer product available on a disc, a memory storage device, or for download from a remote machine. The embodiments described herein may include such software to implement the equations, relationships, and algorithms described below. One skilled in the art will appreciate further features and advantages of the invention based on the above-described embodiments. Accordingly, the invention is not to be limited by what has been particularly shown and described, except as indicated by the appended claims.

In exemplary embodiments, a computer system component may constitute a “module” that is configured and operates to perform certain operations as described herein below. Accordingly, the term “module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired) or temporarily configured (e.g., programmed) to operate in a certain manner and to perform certain operations described herein.

The required architecture for a variety of these systems will appear from the description below. In addition, the exemplary embodiments may be implemented via any particular programming language suitable for use by those skilled in the art.

In addition, the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, the instant disclosures are intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.

As noted above, there is a need to create applications capable of recognizing the user as one individual person across devices, websites and applications. To solve this problem, companies have begun linking UIDs across environments, creating clusters of UIDs that belong (or are likely to belong) to the same person. Clusters of matched UIDs may be assigned a unique Master Identifier that is used in systems as a cross-environment substitute ID for the UIDs created on a per-environment basis. However, UIDs are not static and do vary over time. New UIDs may appear and others may get lost over time, for example, due to cookie expiration or deletion, device ID resets and/or newly purchased devices. These continuous changes make it challenging to create a master identifier that is persistent and that is associated with the same person while the related UIDs are changing over time.

One or more of the inventive embodiments relate to a method, system, and/or computer product for maintaining a persistent master identifier across a plurality of devices. In one embodiment, the method facilitates tracking of a persistent master identifier for volatile or distinct clusters of user identifiers that are attributed to the same person. Further, in other embodiments, the method facilitates tracking user interactions with various systems/platforms across a plurality of devices and stores them in a separate database. Thereafter, the method compares different clusters that may be associated with a particular user and selectively assigns a persistent master identifier, based on the comparison.

FIG. 1 is a block diagram illustrating an environment 100 in which embodiments of the present invention may be practiced. Environment 100 may include one or more users such as a user 102; a browser 104; a MatchID analysis system 106; a network 108; an analysis data repository 110; and one or more servers 112 such as a merchant server 112 a, a social network server 112 b, a content server 112 c, and a bank server 112 d. Servers 112 provide various details associated with user identifiers to MatchID analysis system 106, as described below. Browser 104 includes a browser application 114. The browser application 114 may execute on a computing device 116. The computing device 116 may be, for example, a laptop computer, desktop computer, ultrabook, tablet computer, mobile device, smart phone, smart TV, or server, among others. Browser application 114 can include an executable file that runs outside browser 104 in computing device 116. The term “browser application” as used herein, includes, but is not limited to, mobile applications that run on smart phones, tablet computers, and other mobile or portable computing devices. Browser application 114 may allow user 102 to connect to services traditionally available on desktop or wireless platforms. Typically, these services use the Internet, an intranet, or cellular or wireless fidelity (Wi-Fi) networks to access, retrieve, transmit and share data.

In various embodiments of the present invention, user 102 may use browser 104 to access one or more servers 112 via browser application 114 through network 108. Examples of browser 104 may include, but are not limited to, Microsoft® Internet Explorer, Mozilla Firefox®, Apple Safari®, Google® Chrome, and Opera®. Examples of network 108 may include wired networks and wireless networks. In some embodiments, user 102 may use browser 104 to shop for products online.

In an embodiment of the invention, user 102 is registered with each of the servers 112. One or more servers 112 such as a merchant server 112 a, a social network server 112 b, a content server 112 c, and a bank server 112 d may be located remotely from user 102.

Even though only one computing device 116 is shown in FIG. 1, user 102 may own more than one device. For example, user 102 may own a laptop and a smartphone. When the different computing devices 116, applications, software, software modules and/or other components access content (e.g., websites, services, and/or locations) online, one or more user device identifiers may be recorded, captured, and/or stored by servers 112 that host the content. For example, when a user 102 accesses a news website using a web browser application 114 on a smartphone 116, the servers 112 that host content for the news website may record the IP address of the computing device 116, other website header information (e.g., HTTP header information/fields), and an identifier for the web browser application (e.g., an identifier in a cookie for the website, such as a cookie ID).

MatchID analysis system 106 interacts with various servers 112 to collect various information about a user/device, dynamically analyzes the information and facilitates tracking of a persistent master identifier (also referred to hereinafter as MatchID) for volatile or distinct clusters of user identifiers. In other words, the MatchID analysis system 106 is configured and operable to determine if particular user identifiers belong to the same person. Thousands of various data points may be compared by the MatchID analysis system 106 to establish a link indicating that particular UIDs belong to the same user.

In an embodiment, MatchID analysis system 106 interacts with servers 112 either continuously or at pre-defined intervals of time to retrieve information regarding one or more users 102. The information may include, but is not limited to, IP address and user ID/device ID associated with user 102, user ID type, user activity information (e.g., time information related to a particular event), and user agent information. In an embodiment of the invention, the pre-defined interval of time may be an hour, a day, a week, or longer. In an embodiment of the invention, MatchID analysis system 106 may automatically receive data feeds from various servers such as 112 a, 112 b, and 112 c.

MatchID analysis system 106 stores the retrieved information (e.g., clusters of user identifiers) in analysis data repository 110. Data repository 110 may include one or more databases, such as relational databases.

As user 102 performs an activity on browser 104, browser application 114 may track the activity. In one embodiment, browser application 114 tracks the activity based on the context of browser 104. The tracked activity is sent to one or more servers 112 via network 108. Examples of the context of browser 104 may include, but are not limited to, the website Uniform Resource Locator (URL), the website, the content of a webpage, a search query, the configuration of the browser, and the configuration of the computing device associated with a user, such as IP address, type of operating system, type of computing device, etc., facilitating the user's interactions with one or more servers 112.

Further, tracking of cross-environment user identifiers is explained in greater detail below.

FIG. 2 is a block diagram illustrating the MatchID analysis system of FIG. 1, according to one embodiment of the present invention. MatchID analysis system 106 may include one or more software modules. The software modules may comprise a software program or set of instructions executed by a processor. According to the illustrative embodiment of FIG. 2, the software modules may include a data acquisition module 200, a clustering comparison execution module 202, a persistent master ID determination module 204, a data output module 206 and/or a data storage module 208.

According to this embodiment, the data acquisition module 200 may be generally configured and enabled to receive or extract user data related to user identifiers. Such data and information can be received/retrieved from each server 112 periodically, for example, at predetermined periods of time. As noted above, the received/retrieved information may include, for example, IP address and user ID associated with user 102, user ID type, various attributes and heuristics information associated with user IDs, user activity information (e.g., time information related to a particular event), user agent profile information and the like. The received information is then sent to the data storage module 208 or to the clustering comparison execution module 202. As described below, the clustering comparison execution module 202 may be generally configured and enabled to compare each of the received candidate clusters with a plurality of previously received clusters that are stored in a master table to find matching ones. Next, the clustering comparison execution module 202 passes the results of such comparison to the persistent master ID determination module 204. In an embodiment, the persistent master ID determination module 204 determines a degree of overlap of user identifiers or attributes between matching clusters for all matching combinations of stored clusters and candidate clusters. The persistent master ID determination module 204 may be generally configured and enabled to selectively assign a persistent master identifier to the candidate cluster having the highest degree of overlap with a corresponding cluster stored in the master table, as described below. According to another exemplary embodiment, when all data received by the data acquisition module 200 is processed, the data output module 206 may be generally configured and operable to output the result data back to one or more servers 112, as described below. The data storage module 208 may be implemented mainly by the data repository 110 operatively connected to the MatchID analysis system 106.

According to the illustrative embodiment of FIG. 3, a method for maintaining a persistent master identifier across a plurality of devices may be depicted in diagram form. Before turning to the description of FIG. 3, it is noted that the flow diagram shown therein is described, by way of example, with reference to components shown in FIGS. 1 and 2, although these operational steps may be carried out in any system and are not limited to the scenario shown in the aforementioned figures. Additionally, while an illustrative flow diagram, such as that illustrated in the FIG. 3 embodiment, may show operational steps carried out in a particular order, as indicated by the lines connecting the blocks, the various steps shown in these diagrams can be performed in any combination or sub-combination consistent with the disclosures provided herein. It should be appreciated that in some embodiments some of the steps described below may be combined into a single step. In some embodiments, one or more additional steps may be included.

Initially, at step 302, the MatchID analysis system 106 (e.g., the data acquisition module 200) receives an initial list of clusters of user identifiers. In one embodiment, this initial list may be in the form of a device graph. For example, the device graph represented by the cluster 402 in FIG. 4 includes three user device identifiers: user device identifier A 402 a, user device identifier B 402 b and user device identifier C 402 c. As discussed above, the user device identifier may be an identifier used by a device when the device accesses content (e.g., accesses a news website). User device identifier A is a cookie ID for a browser application on a smartphone, user device identifier B is a device identifier for a computing device that has been hashed using, for example, the Message Digest-5 (MD-5) hashing function/algorithm, and user device identifier C is a device identifier for a computing device that has been hashed using another hash algorithm, such as, for example, the Secure Hash Algorithm-1 (SHA-1) hashing function/algorithm. Those skilled in the art may implement other algorithms with equal or substantially similar functionality to the hash algorithms described.

In the device graph, a user device identifier may represent a device that is associated with the user device identifier (e.g., may represent the device that is using the user device identifier). In one embodiment, a device may be a computing device 116 and/or an application, software, software modules, and/or other components on the computing device 116. For example, the device may be one or more of a desktop computer, a laptop computer, a server computer, a PDA, a smartphone, a web-enabled television set, a smart television set, a gaming console, a connected car, and/or any other device capable of processing, managing and/or transmitting data. In another example, the device may be software, a software module, an application, and/or other component on a computing device.

Each device (e.g., each computing device and/or each software or application) may have one or more user device identifiers. For example, a smartphone (e.g., a device) may have a MAC address, a serial number (e.g., a serial number from a manufacturer), an Open Device Identification Number (ODIN), a Unique Device Identifier (UDID), an OpenUDID, a Globally Unique Identifier (GUID), an IMEI number, etc., which may each be user device identifiers. In another example, applications, software, and/or software modules may also have user device identifiers. For example, an application on a computing device may have a serial number which may be the user device identifier for the application. In another example, a web browser application may have a cookie which includes an identifier, and the identifier in the cookie (e.g., the cookie ID) may be the user device identifier for the web browser application. In other embodiments, user device identifiers may include, but are not limited to, MAC addresses, IMEI numbers, serial numbers, ODINs, UDIDs, OpenUDIDs, GUIDs, cookie IDs, an iOS® IDFA, an Identifier for Vendors (IDFV), and/or any other data/information which may be used to identify a device (e.g., an application, software, and/or a computing device). In one embodiment, a user device identifier may be a number (e.g., 734598238742), an alphanumeric value (e.g., A984FDSJL334), a string of characters (e.g., HZ$98!324*J), or any type of value that may be used to identify a device (e.g., an application, software, and/or a computing device).

In one embodiment, a device (e.g., a computing device, an application, software, a software module, etc.) may generate a user device identifier. For example, when the application (e.g., a device) is installed onto the computing device, the application (or an installer/install file for the application) may generate a user device identifier based on a MAC address for the computing device. In another example, a computing device (e.g., a device, such as a smartphone) may generate a user device identifier based on other identifiers for the computing device (e.g., the smartphone may generate a user device identifier based on an IMEI number or a UDID for the smartphone). In some embodiments, the device (e.g., a computing device, an application, etc.) may use a variety of methods, algorithms, operations, and/or functions to generate user device identifiers. For example, an application on a computing device may use a cryptographic hash function (e.g., SHA-1, Secure Hash Algorithm-2 (SHA-2), MD-5, etc.) to generate a user device identifier for the application based on an IMEI for the computing device. In another example, a computing device (e.g., a tablet computer) may use a random number generator (e.g., a Pseudo-Random Number Generator (PRNG)) to generate a device identifier based on a MAC address for the computing device.
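By way of a non-limiting illustration, the hash-based generation described above can be sketched as follows. The function name `derive_uid` and the sample IMEI value are hypothetical and are not part of the disclosure; the sketch only shows one way a cryptographic hash function may be applied to a raw device identifier.

```python
import hashlib


def derive_uid(raw_identifier: str, algorithm: str = "sha1") -> str:
    """Hash a raw device identifier (e.g., an IMEI or MAC address).

    `derive_uid` is a hypothetical helper; the algorithm names are the
    ones accepted by Python's hashlib ("sha1", "sha256", "md5", ...).
    """
    digest = hashlib.new(algorithm)
    digest.update(raw_identifier.encode("utf-8"))
    return digest.hexdigest()


# Hypothetical IMEI used purely as sample input.
sample_imei = "490154203237518"
uid_sha1 = derive_uid(sample_imei)            # 40 hex characters
uid_sha2 = derive_uid(sample_imei, "sha256")  # 64 hex characters
uid_md5 = derive_uid(sample_imei, "md5")      # 32 hex characters
```

Because a hash function is deterministic, the same raw identifier always yields the same user device identifier, which is what allows the identifier to be matched across clusters received at different times.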

According to an embodiment of the present invention, in addition to receiving user identifier information, at step 302, the MatchID analysis system 106 may receive other information associated with the plurality of user identifiers. Such information may include, but is not limited to, one or more attributes associated with each of the user identifiers, one or more user activity events having corresponding IP addresses and time stamps associated with user identifiers, user agent strings associated with the received events, and the like. As a non-limiting example, the user agent string may include various device specific information: “Mozilla/[version] ([system and browser information]) [platform] ([platform details]) [extensions]”.

Referring back to FIG. 3, at step 304, once the initial list of clusters is received, the MatchID analysis system 106 builds a master table based on the received device graph. Each entry in the master table includes at least a persistent master identifier (MatchID) that uniquely identifies one of the plurality of received clusters, a user identifier (User ID) contained within the cluster identified by the MatchID and one or more attributes associated with the user identifier. Table 1 below illustrates a simplified version of the master table that can be built by the MatchID analysis system 106.

TABLE 1

MatchID  User ID  IP Addresses  Model      Timestamp
123      A        1.2.3.4       iPhone7    2016-01-06 20:34:25 UTC
                  2.3.4.5                  2016-01-08 18:29:25 UTC
                  3.4.5.6                  2016-01-23 23:49:39 UTC
123      B        1.2.3.4       MacBook    2016-01-16 20:34:25 UTC
                  2.3.4.5                  2016-01-26 00:43:32 UTC
123      C        1.2.3.4       GalaxyTab  2016-01-26 22:03:25 UTC
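A minimal sketch of how such a master table might be built at step 304 is given below. The `Entry` class, the `build_master_table` function and the in-memory list representation are assumptions made for illustration only; the disclosure does not prescribe a particular data structure.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Entry:
    """One master-table row: MatchID, user ID and its attributes."""
    match_id: int
    user_id: str
    attributes: dict = field(default_factory=dict)


def build_master_table(clusters, first_match_id=1):
    """Store every user ID of every cluster as a separate entry.

    `clusters` is assumed to be a list of dicts mapping a user ID to its
    attributes; each cluster receives its own MatchID.
    """
    match_ids = count(first_match_id)
    table = []
    for cluster in clusters:
        match_id = next(match_ids)
        for user_id, attrs in cluster.items():
            table.append(Entry(match_id, user_id, dict(attrs)))
    return table


# Initial device graph holding the single cluster shown in Table 1.
initial_clusters = [
    {
        "A": {"model": "iPhone7", "ips": ["1.2.3.4", "2.3.4.5", "3.4.5.6"]},
        "B": {"model": "MacBook", "ips": ["1.2.3.4", "2.3.4.5"]},
        "C": {"model": "GalaxyTab", "ips": ["1.2.3.4"]},
    },
]
master_table = build_master_table(initial_clusters, first_match_id=123)
```

In this sketch the three user IDs of the cluster share MatchID 123, mirroring the three groups of rows in Table 1.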

As noted above, generally, clusters of user identifiers within a device graph are not static. New user IDs may appear and others may get lost over time in various device graphs, for example, due to cookie expiration or deletion, device ID resets and/or newly purchased devices. According to an embodiment, the MatchID analysis system 106 may periodically pull or receive from one or more servers 112 additional user identifier information as another plurality of clusters of user identifiers, referred to hereinafter as candidate clusters, for example, in the form of another device graph (step 306). FIG. 4 is an example of two different clusters of user identifiers associated with two different instants of time, in accordance with illustrative embodiments of the present invention. In FIG. 4, a first cluster 402 represents a cluster already stored in the master table. This cluster includes user identifiers A (402 a), B (402 b) and C (402 c) and may be associated with a persistent master identifier 412 uniquely identifying this cluster. A second cluster 404 shown in FIG. 4 represents an exemplary candidate cluster received by the MatchID analysis system 106 at step 306. The second cluster 404 includes another plurality of user identifiers 404 a-404 c. User identifier D 404 c may comprise an iOS® Identifier For Advertisers (IDFA), for example. It should be noted that clusters 402 and 404 are associated with different time instants. The first cluster 402 may be associated with a first time instant 408, while the second cluster 404 may be associated with a second time instant 410. The MatchID analysis system 106 is configured to determine whether the second cluster 404 is associated with the same user as the first cluster 402, and if so, to assign the same persistent master identifier 412 to the second cluster. FIG. 4 shows one candidate cluster 404 for illustrative purposes only. At step 306, the MatchID analysis system 106 may receive thousands or even millions of candidate clusters.

Referring back to FIG. 3, at step 308, the MatchID analysis system 106 (e.g., clustering comparison execution module 202) may compare each of the plurality of candidate clusters with one or more clusters in the master table (Table 1) to identify matching user device identifiers. The results of the comparison are then analyzed by the MatchID analysis system 106 to determine which candidate clusters belong to the same person as clusters stored in the master table. More specifically, at step 310, the MatchID analysis system 106 determines if any matching is successful. In response to determining that matching was successful (decision block 310, “Yes” branch), at step 312, the MatchID analysis system 106 determines if there is only one candidate cluster having user identifiers overlapping with identifiers contained in a particular cluster in the master table. If so (decision block 312, “Yes” branch), the MatchID analysis system 106 assigns the persistent master identifier of the matching cluster in the master table to the matching candidate cluster.

According to the illustrative embodiment of FIG. 5, a persistent master identifier may be associated with a candidate cluster in a case where there is only one candidate cluster with overlapping UIDs, in accordance with illustrative embodiments of the present invention. FIG. 5 shows clusters 402 and 404 discussed above in conjunction with FIG. 4. In this figure, a first link 502 represents a match between the user identifier B 402 b of the first cluster 402 and the user identifier B 404 a of the second cluster 404, while a second link 504 represents a match between the user identifier C 402 c of the first cluster 402 and the user identifier C 404 b of the second cluster 404. If the second cluster 404 is the only candidate cluster having user identifiers matching the identifiers of the first cluster 402, then the MatchID analysis system 106 decides that clusters 402 and 404 belong to the same user, assigns the persistent master identifier 506 of the first cluster 402 to the second cluster 404 (step 314) and updates the master table (step 322). Table 2 below illustrates such update made to the master table (Table 1):
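The comparison of steps 308 through 312 can be illustrated with the following sketch, which represents each cluster as a set of user IDs. The function names `find_overlaps` and `assign_if_single`, and the candidate labels, are hypothetical; this is one possible reading of the described logic, not a definitive implementation.

```python
def find_overlaps(stored_uids, candidates):
    """Return {candidate name: shared user IDs} for every candidate
    cluster that shares at least one user ID with the stored cluster."""
    stored = set(stored_uids)
    overlaps = {}
    for name, uids in candidates.items():
        shared = stored & set(uids)
        if shared:
            overlaps[name] = shared
    return overlaps


def assign_if_single(match_id, stored_uids, candidates):
    """Assign the stored MatchID only when exactly one candidate
    cluster overlaps (decision block 312, "Yes" branch)."""
    overlaps = find_overlaps(stored_uids, candidates)
    if len(overlaps) == 1:
        (name,) = overlaps  # unpack the only overlapping candidate
        return {name: match_id}
    return {}


# Stored cluster 402 and two hypothetical candidate clusters.
stored_cluster = ["A", "B", "C"]
candidate_clusters = {"404": ["B", "C", "D"], "other": ["X", "Y"]}
assignment = assign_if_single(123, stored_cluster, candidate_clusters)
```

Here only candidate "404" shares user IDs (B and C) with the stored cluster, so it inherits MatchID 123. When several candidates overlap, the sketch returns an empty result and the decision is deferred to the later steps of FIG. 3.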

TABLE 2

MatchID  User ID  IP Addresses  Model      Timestamp
123      D        1.2.3.4       iPhone7    2016-01-07 20:45:24 UTC
                  3.4.5.6                  2016-02-12 00:55:44 UTC
123      B        1.2.3.4       MacBook    2016-01-16 20:34:25 UTC
                  2.3.4.5                  2016-01-26 00:43:32 UTC
123      C        1.2.3.4       GalaxyTab  2016-01-26 22:03:25 UTC

In other words, in this case, the entry associated with the user identifier D (404 c) replaces the entry associated with the user identifier A (402 a) within the first cluster 402 stored in the master table. It should be noted that an entry associated with the user identifier A (402 a) may stay in the master table for a predetermined period of time before being purged by the MatchID analysis system 106.
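The update of step 322, including the delayed purge of superseded entries, might be sketched as follows. The dictionary layout, the `stale_since` field, the function names and the 30-day retention period are all assumptions chosen for illustration; the disclosure states only that a superseded entry may remain for a predetermined period of time.

```python
from datetime import datetime, timedelta, timezone


def apply_candidate(table, match_id, candidate_uids, now):
    """Merge a matched candidate cluster into the master table.

    `table` is assumed to map (match_id, user_id) to a row dict whose
    "stale_since" value is None while the user ID is still observed.
    """
    current = set(candidate_uids)
    for (mid, uid), row in table.items():
        if mid != match_id:
            continue
        if uid in current:
            row["stale_since"] = None   # still present in the cluster
        elif row["stale_since"] is None:
            row["stale_since"] = now    # first time the ID is missing
    for uid in current:
        table.setdefault((match_id, uid), {"stale_since": None})


def purge(table, now, retention):
    """Delete entries that have been stale for at least `retention`."""
    expired = [
        key for key, row in table.items()
        if row["stale_since"] is not None
        and now - row["stale_since"] >= retention
    ]
    for key in expired:
        del table[key]
    return expired


t0 = datetime(2016, 2, 12, tzinfo=timezone.utc)
table = {(123, uid): {"stale_since": None} for uid in ("A", "B", "C")}
apply_candidate(table, 123, ["B", "C", "D"], now=t0)
removed_early = purge(table, t0 + timedelta(days=10), timedelta(days=30))
removed_late = purge(table, t0 + timedelta(days=31), timedelta(days=30))
```

In this sketch the entry for user identifier A remains in the table after the update and is removed only once the assumed retention period has elapsed, leaving the entries for D, B and C as in Table 2.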

Referring back to FIG. 3, in response to determining that there is more than one candidate cluster having overlapping user identifiers with the first cluster 402 (decision block 312, “No” branch), at step 316, the MatchID analysis system 106 determines whether only one candidate cluster has a highest number of user identifiers overlapping with the user identifiers of the first cluster 402. If so (decision block 316, “Yes” branch), the MatchID analysis system 106 assigns the persistent master identifier of the matching cluster in the master table to the candidate cluster with the highest number of overlapping user identifiers (step 318).

In the illustrative embodiment of FIG. 6, a persistent master identifier may be associated with a candidate cluster in a case where there is more than one candidate cluster with overlapping UIDs and only one candidate cluster having the highest number of overlapping UIDs, in accordance with illustrative embodiments of the present invention. The first cluster 402 includes user identifiers A (402 a), B (402 b) and C (402 c) stored in the master table. A third cluster 602 and a fourth cluster 604 represent candidate clusters received by the MatchID analysis system 106 in step 306 and identified as having at least one matching user identifier. The third cluster 602 includes user identifiers A (602 a), B (602 b) and C (602 c), while the fourth cluster 604 includes only two user identifiers: user identifier C (604 a) and user identifier E (604 b). In one embodiment, the user identifier E may be a cookie ID for a browser application on a MacBook device. However, the third cluster 602 has two user identifiers 602 a and 602 b overlapping with two user identifiers 402 a and 402 b in the first cluster 402 (overlaps represented by links 606 and 608), while the fourth cluster 604 includes only one user identifier C (604 a) overlapping with one user identifier C (402 c) of the first cluster 402 (represented by the link 610). Since the third cluster 602 has the highest number of overlapping identifiers, the MatchID analysis system 106 decides that clusters 402 and 602 belong to the same user, assigns the persistent master identifier 412 of the first cluster 402 to the persistent master identifier 612 of the third cluster 602 (step 318) and updates the master table (step 322). Table 3 below illustrates such an update made to the master table (Table 1):

TABLE 3

MatchID  User ID  IP Addresses  Model    Timestamp
123      A        1.2.3.4       iPhone7  2016-01-06 20:34:25 UTC
                  2.3.4.5                2016-01-08 18:29:25 UTC
                  3.4.5.6                2016-01-23 23:49:39 UTC
123      B        1.2.3.4       MacBook  2016-01-16 20:34:25 UTC
                  2.3.4.5                2016-01-26 00:43:32 UTC
123      D        1.2.3.4       iPhone7  2016-01-07 20:45:24 UTC
                  3.4.5.6                2016-02-12 00:55:44 UTC

In other words, in this case, the entry associated with the user identifier D (404 c) replaces the entry associated with the user identifier C (402 c) within the first cluster 402 stored in the master table. Furthermore, the entry associated with the user identifier C (402 c) may now be associated with the fourth cluster 604.
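As a hedged sketch of steps 316 and 318, the selection of the candidate cluster with a unique highest overlap count may be expressed as shown below. The function name and the cluster labels used as dictionary keys are hypothetical. The membership of candidate 602 is taken as A, B and D so that the overlap counts agree with links 606, 608 and 610 and with Table 3; that membership is an assumption of the example.

```python
from typing import Dict, Optional, Set


def pick_unique_best_overlap(stored_ids: Set[str],
                             candidates: Dict[str, Set[str]]) -> Optional[str]:
    """Return the label of the only candidate with the highest number of
    overlapping identifiers, or None when there is no overlap at all or
    when several candidates tie for the highest count."""
    counts = {name: len(stored_ids & ids) for name, ids in candidates.items()}
    counts = {name: n for name, n in counts.items() if n > 0}
    if not counts:
        return None
    best = max(counts.values())
    winners = [name for name, n in counts.items() if n == best]
    return winners[0] if len(winners) == 1 else None


first_cluster = {"A", "B", "C"}

# FIG. 6 style case: one candidate overlaps on two identifiers, the other on one.
fig6 = pick_unique_best_overlap(first_cluster,
                                {"602": {"A", "B", "D"}, "604": {"C", "E"}})

# FIG. 7 style case: each candidate overlaps on exactly one identifier.
fig7 = pick_unique_best_overlap(first_cluster,
                                {"702": {"A", "D", "E"}, "704": {"C", "F"}})
```

Here `fig6` is `"602"`, while `fig7` is `None`, signalling that a tie remains and that additional heuristics would be applied.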

In the illustrative embodiment of FIG. 7, a persistent master identifier may be associated with a candidate cluster in a case where there is more than one candidate cluster with overlapping UIDs and more than one candidate cluster having the highest number of overlapping UIDs, in accordance with illustrative embodiments of the present invention. In addition to the first cluster 402, FIG. 7 shows a fifth cluster 702 and a sixth cluster 704 representing candidate clusters received by the MatchID analysis system 106 in step 306 and identified as having at least one matching user identifier. The fifth cluster 702 includes user identifiers A (702 a), D (702 b) and E (702 c), while the sixth cluster 704 includes identifiers C (704 a) and F (704 b). In one embodiment, the user identifier F may be a cookie ID for a browser application on an iPhone device. In this case, the fifth cluster 702 and the sixth cluster 704 have the same number of matching identifiers (one each), represented by links 706 and 708 in FIG. 7.

According to an embodiment of the present invention, referring back to FIG. 3 yet again, in response to determining that there is more than one candidate cluster having the highest number of overlapping user identifiers with the first cluster 402 (decision block 316, “No” branch), at step 320, the MatchID analysis system 106 may employ additional heuristics information in conjunction with the user device identifiers to associate two or more clusters to the same user. Such additional heuristics information may be advantageous in situations in which there is a tie between the candidate clusters having a highest number of matching user device identifiers (such as the situation shown in FIG. 7). In an example, the heuristic information is a user identifier type heuristic that specifies a type of user identifier included in each cluster. In an embodiment, the MatchID analysis system 106 may employ one or more weighting factors, with stronger weights being associated with stronger user device identifiers. For example, the MatchID analysis system 106 may associate stronger weights with the device, application or system based user identifiers as compared to cookie based user device identifiers.
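The identifier type heuristic may be illustrated, purely as a sketch, by a weighted overlap score. The numeric weights, the type labels and the types assigned to identifiers A through F below are hypothetical values chosen for the example; the embodiments state only that device, application or system based identifiers may be weighted more strongly than cookie based identifiers.

```python
from typing import Dict

# Hypothetical weights: stronger identifier types count for more than cookies.
ID_TYPE_WEIGHTS: Dict[str, float] = {
    "device": 3.0,
    "application": 3.0,
    "system": 3.0,
    "cookie": 1.0,
}


def weighted_overlap(stored: Dict[str, str], candidate: Dict[str, str]) -> float:
    """Sum the weights of identifiers present in both clusters.
    Each cluster maps a user identifier to its identifier type."""
    shared = stored.keys() & candidate.keys()
    return sum(ID_TYPE_WEIGHTS.get(stored[uid], 1.0) for uid in shared)


# Assumed types for the tie of FIG. 7: A is device based, C is cookie based.
first_cluster = {"A": "device", "B": "cookie", "C": "cookie"}
tied = {
    "702": {"A": "device", "D": "device", "E": "cookie"},
    "704": {"C": "cookie", "F": "cookie"},
}

scores = {name: weighted_overlap(first_cluster, ids) for name, ids in tied.items()}
best = max(scores, key=scores.get)
```

Under these assumed weights, candidate 702 scores 3.0 and candidate 704 scores 1.0, so the tie in raw overlap counts is resolved in favor of the cluster sharing the stronger identifier.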

As another heuristics information example, location information, such as GPS location information, reverse Internet Protocol (IP) address mapping, and/or other information, may be employed as a weighting factor, with stronger weights being associated with user device identifiers of the candidate clusters that are located in the same general geographic area as a user device identifier of the cluster stored in the master table. Further, in cases in which such a weighting factor is low or not available (thus indicating that two separate user devices are not physically located close to each other), the weighting factor may not reduce the effect of user device identifiers that indicate a strong association of the two or more user devices to the same user. In such cases, different user devices of the same user may be purposely positioned in different locations (e.g., due to the user travelling and leaving one of the user devices at home) or the user may employ different Internet services for different user devices.
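One possible reading of the location weighting, offered only as a sketch, is a multiplier that can raise a score when two devices share a geographic area but never lowers it when location is missing or different. The bonus value, the area labels and the function names are assumptions made for illustration.

```python
from typing import Optional

SAME_AREA_BONUS = 0.5  # hypothetical bonus for a shared geographic area


def location_multiplier(stored_area: Optional[str],
                        candidate_area: Optional[str]) -> float:
    """Return a factor greater than 1.0 when both areas are known and equal.
    Missing or differing locations yield exactly 1.0, so they never reduce
    the effect of an otherwise strong identifier match."""
    if stored_area is not None and stored_area == candidate_area:
        return 1.0 + SAME_AREA_BONUS
    return 1.0


def location_weighted_score(base_score: float,
                            stored_area: Optional[str],
                            candidate_area: Optional[str]) -> float:
    """Scale an identifier-based score by the location multiplier."""
    return base_score * location_multiplier(stored_area, candidate_area)


same_area = location_weighted_score(2.0, "area-1", "area-1")
unknown_area = location_weighted_score(2.0, "area-1", None)
different_area = location_weighted_score(2.0, "area-1", "area-2")
```

The design choice in this sketch is that location acts as a bonus only, which corresponds to the case of a user who travels and leaves one device at home.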

In other examples, additional heuristics information may include explicit identifiers obtained from third-party authentication services, such as those offered by Facebook® or Google®, which may be used to more correctly select two or more clusters associated to the same user. In a further example, the heuristic information may indicate the cluster having the highest number of user device identifiers. In an embodiment of the present invention, if none of the heuristics information helps to break the tie between candidate clusters, the MatchID analysis system 106 may randomly select one of the candidate clusters at step 320.
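The tie-breaking of step 320 may be sketched as an ordered list of scoring heuristics followed by a random fallback. Applying the heuristics one after another and narrowing the set of tied clusters at each stage is an assumption of this sketch, as are the function name and the example scores; the embodiments do not prescribe an order.

```python
import random
from typing import Callable, List, Sequence


def break_tie(tied: Sequence[str],
              heuristics: Sequence[Callable[[str], float]],
              rng: random.Random) -> str:
    """Apply each heuristic in turn, keeping only the top-scoring clusters.
    If every heuristic leaves a tie, select one remaining cluster at random."""
    remaining: List[str] = list(tied)
    for score in heuristics:
        scores = {name: score(name) for name in remaining}
        best = max(scores.values())
        remaining = [name for name in remaining if scores[name] == best]
        if len(remaining) == 1:
            return remaining[0]
    return rng.choice(remaining)


# Hypothetical inputs: neither cluster carries an explicit third-party login,
# and the cluster sizes follow FIG. 7 (three identifiers versus two).
explicit_login = {"702": 0, "704": 0}
cluster_size = {"702": 3, "704": 2}

chosen = break_tie(["702", "704"],
                   [lambda n: explicit_login[n], lambda n: cluster_size[n]],
                   random.Random(0))

# When no heuristic separates the clusters, the choice is random.
fallback = break_tie(["702", "704"], [lambda n: 0.0], random.Random(0))
```

In this example `chosen` is `"702"`, because the cluster size heuristic separates the two candidates, whereas `fallback` is whichever of the two the random generator selects.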

If matching the obtained candidate cluster's user device identifiers to respective user device identifiers in the clusters stored in the master table does not yield any matching candidate clusters (decision block 310, “No” branch), at step 324 the MatchID analysis system 106 compares attributes of user device identifiers in the candidate clusters to respective attributes of user device identifiers included in the clusters stored in the master table. In an example, the attributes may include the device models and visited IP addresses associated with each user device identifier in each cluster. Device models are typically names or codes that may be used as labels to distinguish one type of device from another. In a non-limiting example, device models may include iPhone_iOS10.3, PC_Window10.1 or GalaxyS7. The MatchID analysis system 106 may be configured to extract device model information from the aforementioned user agent string. In one embodiment, the MatchID analysis system 106 may store both the device models and visited IP address information as part of an entry stored in the master table (as shown in Tables 1-3 above).

In the illustrative embodiment of FIG. 8, a persistent master identifier is associated with a candidate cluster based on matching attributes associated with UIDs in a case where there are no overlapping UIDs, in accordance with illustrative embodiments of the present invention. FIG. 8 shows a seventh cluster 804 and an eighth cluster 808 representing candidate clusters received by the MatchID analysis system 106 in step 306. The seventh cluster 804 includes user identifiers D (804 a) and E (804 b), while the eighth cluster 808 includes identifiers F (808 a), G (808 b), and H (808 c). None of these identifiers match the identifiers A (402 a), B (402 b), and C (402 c) stored in the cluster 402. Furthermore, FIG. 8 illustrates various attributes associated with each identifier. For example, the attributes 802 a-802 c are shown as part of the first cluster 402 and are associated with respective identifiers 402 a-402 c, the attributes 806 a and 806 b are associated with the identifiers D (804 a) and E (804 b) (seventh cluster 804), and the attributes 810 a-810 c are associated with the identifiers F (808 a), G (808 b), and H (808 c) (eighth cluster 808).

According to an embodiment of the present invention, step 324 performed by the MatchID analysis system 106 involves comparing combinations of attributes 806 a-806 b and 810 a-810 c of candidate clusters to combinations of attributes 802 a-802 c of clusters stored in the master table. At step 326, the MatchID analysis system 106 determines if a match exists between the corresponding attributes. In certain example embodiments, if no match is found (decision block 326, “No” branch), then the MatchID analysis system 106 may assign a newly generated persistent master identifier to each candidate cluster having no matching attributes. However, if a match is found (decision block 326, “Yes” branch), at step 330, the MatchID analysis system 106 determines if only one candidate cluster has the highest number of matching attributes. In the example shown in FIG. 8, the links 812, 814 and 816 represent matches between user device identifier attributes 802 a and 806 a of the first cluster 402 and user device identifier attributes 802 b and 806 b of the seventh cluster 804, respectively, while the link 818 shows matching user device identifier attributes 802 a of the first cluster 402 and attributes 810 a of the eighth cluster 808. Since the seventh cluster 804 has the highest number of matching attributes, in step 332, the MatchID analysis system 106 assigns the persistent master identifier 822 to the seventh cluster 804.
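A minimal sketch of the attribute comparison of steps 324 through 332 follows, assuming that an attribute combination is an (IP address, device model) pair. The stored cluster mirrors the IP addresses and models listed for identifiers A, B and C; the attribute values of candidates 804 and 808, the function names and the pair representation are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A cluster maps each user identifier to (device model, visited IP addresses).
Cluster = Dict[str, Tuple[str, Set[str]]]


def attribute_combinations(cluster: Cluster) -> Set[Tuple[str, str]]:
    """Flatten a cluster into a set of (IP address, device model) pairs."""
    return {(ip, model) for model, ips in cluster.values() for ip in ips}


def best_attribute_matches(stored: Cluster,
                           candidates: Dict[str, Cluster]
                           ) -> Tuple[List[str], Dict[str, int]]:
    """Count matching attribute combinations per candidate and return the
    labels with the highest non-zero count. An empty list means no match,
    in which case a new persistent master identifier would be generated."""
    stored_combos = attribute_combinations(stored)
    counts = {name: len(stored_combos & attribute_combinations(c))
              for name, c in candidates.items()}
    best = max(counts.values(), default=0)
    winners = sorted(n for n, c in counts.items() if c == best and c > 0)
    return winners, counts


first_cluster: Cluster = {
    "A": ("iPhone7", {"1.2.3.4", "2.3.4.5", "3.4.5.6"}),
    "B": ("MacBook", {"1.2.3.4", "2.3.4.5"}),
    "C": ("GalaxyTab", {"1.2.3.4"}),
}
candidates: Dict[str, Cluster] = {
    "804": {"D": ("iPhone7", {"1.2.3.4", "3.4.5.6"}),
            "E": ("MacBook", {"2.3.4.5"})},
    "808": {"F": ("iPhone7", {"1.2.3.4"}),
            "G": ("Pixel", {"9.9.9.9"}),
            "H": ("ThinkPad", {"8.8.8.8"})},
}

winners, counts = best_attribute_matches(first_cluster, candidates)
```

With these assumed values, candidate 804 matches three attribute combinations and candidate 808 matches one, so `winners` contains only `"804"`, which parallels the three links and one link shown in FIG. 8.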

According to some embodiments of the present invention, the attribute information received by the MatchID analysis system 106 in step 306 may further include frequencies of occurrence for each attribute combination in a predetermined period of time (e.g., past few days or past few weeks). Table 4 below illustrates a master table that stores frequencies of occurrence for each attribute combination:

TABLE 4

MatchID  User ID  IP Addresses  Model       Frequency  Timestamp
123      A        1.2.3.4       iPhone7     8          2016-01-06 20:34:25 UTC
                  2.3.4.5                   7          2016-01-08 18:29:25 UTC
                  3.4.5.6                   7          2016-01-23 23:49:39 UTC
123      B        1.2.3.4       MacBook     4          2016-01-16 20:34:25 UTC
                  2.3.4.5                   9          2016-01-26 00:43:32 UTC
123      C        1.2.3.4       Galaxy Tab  2          2016-01-26 22:03:25 UTC

Such additional information may be advantageous in situations in which two or more candidate clusters have the same number of matching attribute combinations. Accordingly, in response to determining that more than one candidate cluster has the highest number of matching attributes (decision block 330, “No” branch), at step 334, the MatchID analysis system 106 may assign the persistent master identifier to the cluster having attribute combinations that have been observed most frequently, which may be determined based on the stored timestamp information. In an embodiment of the present invention, if none of the attribute information helps to break the tie between the candidate clusters, the MatchID analysis system 106 may randomly select one of the candidate clusters at step 334.
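The frequency-based selection of step 334 may be sketched as follows, under the assumption that each tied candidate cluster carries a frequency of occurrence per (IP address, device model) combination and that the frequencies of the matching combinations are summed. The summation rule, the function name and the candidate labels and values are illustrative assumptions rather than the described method.

```python
from typing import Dict, List, Set, Tuple

Combo = Tuple[str, str]  # (IP address, device model)


def pick_most_frequent(stored_combos: Set[Combo],
                       tied: Dict[str, Dict[Combo, int]]
                       ) -> Tuple[List[str], Dict[str, int]]:
    """For each tied candidate, total the observed frequencies of the
    combinations that also occur in the stored cluster, then return the
    labels with the highest total. More than one label means the tie
    persists and a random selection would follow."""
    totals = {name: sum(freq for combo, freq in freqs.items()
                        if combo in stored_combos)
              for name, freqs in tied.items()}
    best = max(totals.values())
    winners = sorted(name for name, total in totals.items() if total == best)
    return winners, totals


stored_combos = {("1.2.3.4", "iPhone7"), ("2.3.4.5", "MacBook")}

# Both hypothetical candidates match exactly one stored combination.
tied = {
    "X": {("1.2.3.4", "iPhone7"): 8, ("9.9.9.9", "Pixel"): 20},
    "Y": {("2.3.4.5", "MacBook"): 9},
}

winners, totals = pick_most_frequent(stored_combos, tied)
```

In this example the non-matching combination of candidate X is ignored, so the totals are 8 for X and 9 for Y, and Y is selected.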

As shown in FIG. 3, each time the MatchID analysis system 106 assigns a persistent master identifier (steps 314, 318, 320, 328, 332, 334), the master table is updated in step 322, and the MatchID analysis system 106 returns to step 306 and processes the next received batch of candidate clusters.
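The update of step 322, together with the earlier remark that a replaced entry may stay in the master table for a predetermined period before being purged, may be sketched as shown below. The thirty-day retention period, the table layout and the function name are hypothetical choices made for this illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Set

RETENTION = timedelta(days=30)  # hypothetical retention period

# master identifier -> user identifier -> time the entry became stale
# (None while the identifier is still part of the latest cluster)
MasterTable = Dict[str, Dict[str, Optional[datetime]]]


def update_master_table(table: MasterTable, master_id: str,
                        received_ids: Set[str], now: datetime) -> MasterTable:
    """Associate the received identifiers with the master identifier, mark
    identifiers missing from the received cluster as stale, and purge stale
    identifiers once the retention period has elapsed."""
    entries = table.setdefault(master_id, {})
    for uid in received_ids:
        entries[uid] = None
    for uid, stale_since in list(entries.items()):
        if uid in received_ids:
            continue
        if stale_since is None:
            entries[uid] = now
        elif now - stale_since >= RETENTION:
            del entries[uid]
    return table


t0 = datetime(2016, 2, 12, tzinfo=timezone.utc)
table: MasterTable = {"123": {"A": None, "B": None, "C": None}}

update_master_table(table, "123", {"B", "C", "D"}, t0)
after_first = set(table["123"])

update_master_table(table, "123", {"B", "C", "D"}, t0 + timedelta(days=31))
after_second = set(table["123"])
```

After the first update, identifier A is retained but marked stale; after the second update, made 31 days later, A has been purged and only B, C and D remain under master identifier 123.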

Advantageously, the various embodiments described herein provide a system for creating and maintaining a persistent master identifier for volatile or distinct clusters of user identifiers that are attributed to the same person. In the aforementioned embodiments, the system receives clusters of user identifiers stripped of personally identifiable information and looks for patterns linking one user device to another. In various embodiments, user identifiers may comprise any suitable unique identifiers. Furthermore, the automated system disclosed herein is configured to analyze the frequency of associations between user identifier attributes to determine the individual associations.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

1. A computer-implemented method comprising using at least one hardware processor to: store a plurality of stored clusters in a master table in a memory, wherein each stored cluster comprises a persistent master identifier that is associated with a plurality of user device identifiers; receive, from one or more servers, one or more received clusters, wherein each received cluster comprises a plurality of user device identifiers; and, for each of the one or more received clusters, compare the plurality of user device identifiers in the received cluster with the plurality of user device identifiers in the plurality of stored clusters to match any overlapping user device identifiers between the received cluster and any stored clusters, and, in response to matching one or more overlapping user device identifiers, when all of the matching one or more overlapping user device identifiers are in only a single stored cluster, update the master table to associate the persistent master identifier of the single stored cluster with the plurality of user device identifiers in the received cluster, and, when the matching one or more overlapping user device identifiers are matched across two or more stored clusters, when one of the two or more stored clusters has more overlapping user device identifiers than any other of the two or more stored clusters, update the master table to associate the persistent master identifier of the one stored cluster with the plurality of user device identifiers in the received cluster, and, when none of the two or more stored clusters has more overlapping user device identifiers than any other of the two or more stored clusters, apply one or more heuristics to select one of the two or more stored clusters, and update the master table to associate the persistent master identifier of the selected stored cluster with the plurality of user device identifiers in the received cluster.
2. The method of claim 1, wherein applying one or more heuristics comprises weighting the matched one or more overlapping user device identifiers differently according to the one or more heuristics.
3. The method of claim 2, wherein applying one or more heuristics comprises weighting overlapping user device identifiers that are based on a device identifier, application identifier, or system identifier higher than overlapping user device identifiers that are based on a cookie identifier.
4. The method of claim 2, wherein applying one or more heuristics comprises weighting overlapping user device identifiers that are located in a same geographic area higher than overlapping user device identifiers that are not located in a same geographic area.
5. The method of claim 2, wherein applying one or more heuristics comprises weighting a stored cluster that comprises more user device identifiers over a stored cluster that comprises fewer user device identifiers.
6. The method of claim 1, wherein applying one or more heuristics comprises, when the one or more heuristics are unable to differentiate between the two or more stored clusters, randomly selecting one of the two or more stored clusters.
7. The method of claim 1, wherein each of the plurality of user device identifiers, in each of the plurality of stored clusters and each of the one or more received clusters, is associated with one or more attributes, and wherein the method further comprises using the at least one hardware processor to, for each of the one or more received clusters, in response to not identifying any overlapping user device identifiers: compare the attributes in the received cluster with the attributes in the plurality of stored clusters to match any overlapping attributes between the received cluster and any stored clusters; and, in response to matching one or more overlapping attributes, when all of the matching one or more overlapping attributes are in only a single stored cluster, update the master table to associate the persistent master identifier of the single stored cluster with the plurality of user device identifiers in the received cluster, and, when the matching one or more overlapping attributes are matched across two or more stored clusters, when one of the two or more stored clusters has more overlapping attributes than any other of the two or more stored clusters, update the master table to associate the persistent master identifier of the one stored cluster with the plurality of user device identifiers in the received cluster, and, when none of the two or more stored clusters has more overlapping attributes than any other of the two or more stored clusters, select one of the two or more stored clusters based on additional criteria, and update the master table to associate the persistent master identifier of the selected stored cluster with the plurality of user device identifiers in the received cluster.
8. The method of claim 7, further comprising using the at least one hardware processor to, for each of the one or more received clusters, in response to not identifying any overlapping user device identifiers and any overlapping attributes: generate a new unique persistent master identifier; and store the received cluster as a new stored cluster in the memory, wherein the new stored cluster comprises the new unique persistent master identifier in association with the plurality of user device identifiers in the received cluster.
9. The method of claim 7, wherein the one or more attributes comprise a device model.
10. The method of claim 7, wherein the one or more attributes comprise a visited Internet Protocol (IP) address.
11. The method of claim 7, wherein the one or more attributes comprise a frequency of an occurrence of at least one of the one or more attributes.
12. The method of claim 11, wherein selecting one of the two or more stored clusters based on additional criteria comprises selecting the one of the two or more stored clusters with a highest frequency in the one or more attributes associated with its plurality of user device identifiers.
13. The method of claim 7, wherein the one or more attributes comprise a time of an occurrence of at least one of the one or more attributes.
14. The method of claim 13, wherein selecting one of the two or more stored clusters based on additional criteria comprises selecting the one of the two or more stored clusters with a most recent time in the one or more attributes associated with its plurality of user device identifiers.
15. The method of claim 7, wherein the one or more attributes comprise a user agent string.
16. The method of claim 7, wherein selecting one of the two or more stored clusters based on additional criteria comprises, when the additional criteria are unable to differentiate between the two or more stored clusters, randomly selecting one of the two or more stored clusters.
17. The method of claim 1, wherein the plurality of user device identifiers in one or more of the plurality of stored clusters or one or more received clusters comprise at least one of a media access control (MAC) address, a serial number, an open device identification number (ODIN), a unique device identifier (UDID), a globally unique identifier (GUID), and an international mobile equipment identity (IMEI).
18. The method of claim 1, wherein updating the master table to associate the persistent master identifier of a stored cluster with the plurality of user device identifiers in the received cluster comprises deleting any user device identifiers in the stored cluster that do not overlap with user device identifiers in the received cluster.
19. A system comprising: a memory; at least one hardware processor; and one or more software modules that are configured to, when executed by the at least one hardware processor, store a plurality of stored clusters in a master table in a memory, wherein each stored cluster comprises a persistent master identifier that is associated with a plurality of user device identifiers, receive, from one or more servers, one or more received clusters, wherein each received cluster comprises a plurality of user device identifiers, and, for each of the one or more received clusters, compare the plurality of user device identifiers in the received cluster with the plurality of user device identifiers in the plurality of stored clusters to match any overlapping user device identifiers between the received cluster and any stored clusters, and, in response to matching one or more overlapping user device identifiers, when all of the matching one or more overlapping user device identifiers are in only a single stored cluster, update the master table to associate the persistent master identifier of the single stored cluster with the plurality of user device identifiers in the received cluster, and, when the matching one or more overlapping user device identifiers are matched across two or more stored clusters, when one of the two or more stored clusters has more overlapping user device identifiers than any other of the two or more stored clusters, update the master table to associate the persistent master identifier of the one stored cluster with the plurality of user device identifiers in the received cluster, and, when none of the two or more stored clusters has more overlapping user device identifiers than any other of the two or more stored clusters, apply one or more heuristics to select one of the two or more stored clusters, and update the master table to associate the persistent master identifier of the selected stored cluster with the plurality of user device identifiers in the received cluster.
20. A non-transitory computer-readable medium having instructions stored thereon, wherein the instructions, when executed by a processor, cause the processor to: store a plurality of stored clusters in a master table in a memory, wherein each stored cluster comprises a persistent master identifier that is associated with a plurality of user device identifiers; receive, from one or more servers, one or more received clusters, wherein each received cluster comprises a plurality of user device identifiers; and, for each of the one or more received clusters, compare the plurality of user device identifiers in the received cluster with the plurality of user device identifiers in the plurality of stored clusters to match any overlapping user device identifiers between the received cluster and any stored clusters, and, in response to matching one or more overlapping user device identifiers, when all of the matching one or more overlapping user device identifiers are in only a single stored cluster, update the master table to associate the persistent master identifier of the single stored cluster with the plurality of user device identifiers in the received cluster, and, when the matching one or more overlapping user device identifiers are matched across two or more stored clusters, when one of the two or more stored clusters has more overlapping user device identifiers than any other of the two or more stored clusters, update the master table to associate the persistent master identifier of the one stored cluster with the plurality of user device identifiers in the received cluster, and, when none of the two or more stored clusters has more overlapping user device identifiers than any other of the two or more stored clusters, apply one or more heuristics to select one of the two or more stored clusters, and update the master table to associate the persistent master identifier of the selected stored cluster with the plurality of user device identifiers in the received cluster.